Steven Ponce

Relationship Between Peak Hour and Daily Traffic

Major corridors in Los Angeles and Orange counties show higher traffic volumes than Bay Area routes

Categories: 30DayChartChallenge, Data Visualization, R Programming, 2025

Analysis of California traffic data showing the strong linear relationship between peak hour traffic volume and annual average daily traffic across different counties. This visualization, created for Day 18 of the #30DayChartChallenge with an El País-inspired design, reveals how traffic patterns vary between major urban areas.

Published: April 18, 2025

Figure 1: Scatter plot showing the relationship between peak hour traffic volume (x-axis) and annual average daily traffic (y-axis) across California counties. Los Angeles (dark blue) and Orange County (medium blue) data points generally show higher traffic volumes than the remaining counties in the sample (light blue), which the legend labels "Other Bay Area Counties" and which also include San Diego (SD). A trend line indicates that daily traffic is approximately 10.5 times peak hour volume, with points clustering closely along this line.
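The 10.5 multiplier can also be read as a peak-hour share of daily traffic. The short conversion below writes $P$ for peak hour volume and $r$ for the `peak_ratio` column computed in the tidy step (peak hour volume divided by an average hour, AADT/24). It assumes, for simplicity, that the trend line passes through the origin, which is only approximately true for a fitted line with an intercept.

$$
\text{AADT} \approx 10.5\,P
\;\Longrightarrow\;
\frac{P}{\text{AADT}} \approx \frac{1}{10.5} \approx 0.095,
\qquad
r = \frac{P}{\text{AADT}/24} \approx \frac{24}{10.5} \approx 2.29
$$

Under that assumption, the peak hour carries roughly 9.5% of a day's traffic, about 2.3 times the 1/24 (about 4.2%) it would carry if traffic were spread evenly across the day.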

Steps to Create this Graphic

1. Load Packages & Setup

```r
## 1. LOAD PACKAGES & SETUP ----
suppressPackageStartupMessages({
  pacman::p_load(
    tidyverse,      # Easily Install and Load the 'Tidyverse'
    ggtext,         # Improved Text Rendering Support for 'ggplot2'
    showtext,       # Using Fonts More Easily in R Graphs
    janitor,        # Simple Tools for Examining and Cleaning Dirty Data
    skimr,          # Compact and Flexible Summaries of Data
    scales,         # Scale Functions for Visualization
    lubridate,      # Make Dealing with Dates a Little Easier
    camcorder       # Record Your Plot History
  )
})

### |- figure size ----
# Record every rendered plot to temp_plots/ at the final 8 x 8 in, 320 dpi size
gg_record(
  dir    = here::here("temp_plots"),
  device = "png",
  width  = 8,
  height = 8,
  units  = "in",
  dpi    = 320
)

# Source project utility functions used below
# (e.g., setup_fonts(), get_theme_colors(), create_dcc_caption(), save_plot())
suppressMessages(source(here::here("R/utils/fonts.R")))
source(here::here("R/utils/social_icons.R"))
source(here::here("R/utils/image_utils.R"))
source(here::here("R/themes/base_theme.R"))
```
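Later steps call helpers that are not defined in this post (for example `setup_fonts()`, `get_font_families()`, `get_theme_colors()`, `create_dcc_caption()`, and `save_plot()`); they are presumably defined in the project files sourced above, which live in the author's repository. When re-running the code outside that repository, a quick check such as the sketch below can report which of those files are missing before `source()` fails. The function name `missing_helpers()` is hypothetical, and the file list simply mirrors the paths used above.

```r
# Hypothetical helper: return the sourced project files that do not exist
# under `root`, so a missing helper is reported before any source() call runs.
missing_helpers <- function(files, root = ".") {
  files[!file.exists(file.path(root, files))]
}

# Same relative paths as the setup chunk
helper_files <- c(
  "R/utils/fonts.R",
  "R/utils/social_icons.R",
  "R/utils/image_utils.R",
  "R/themes/base_theme.R"
)

# Inside the project, root would typically be here::here()
absent <- missing_helpers(helper_files)
if (length(absent) > 0) {
  message("Missing helper files: ", paste(absent, collapse = ", "))
}
```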

2. Read in the Data

```r
# Read the raw AADT table and convert column names to snake_case
traffic_volumnes_raw <- read_csv(
  here::here("data/30DayChartChallenge/2025/Traffic_Volumes_AADT.csv")
) |>
  clean_names()
```
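`clean_names()` converts the CSV headers to snake_case, and every later step depends on four of the resulting columns: `county`, `route`, `back_peak_hour`, and `back_aadt`. A minimal guard, sketched below with a hypothetical `require_columns()` helper and a made-up toy data frame standing in for the real file, stops early with a readable message if a header changes upstream.

```r
# Hypothetical helper: fail fast if any expected column is absent.
require_columns <- function(df, required) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("Missing column(s): ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

needed <- c("county", "route", "back_peak_hour", "back_aadt")

# Toy stand-in for the cleaned AADT table (values are made up)
toy <- data.frame(
  county         = c("LA", "ORA"),
  route          = c("5", "405"),
  back_peak_hour = c(9000, 8000),
  back_aadt      = c(95000, 84000)
)

require_columns(toy, needed)          # passes silently
# require_columns(toy[, -4], needed)  # would stop: Missing column(s): back_aadt
```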

3. Examine the Data

```r
# Structural overview, then per-column summaries (types, missingness, ranges)
glimpse(traffic_volumnes_raw)
skim(traffic_volumnes_raw)
```
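The tidy step keeps seven counties described as having "substantial data". One way to check that choice, sketched here in base R on made-up toy rows, is to count how many records per county have both `back_peak_hour` and `back_aadt` present. The helper name `county_coverage()` is hypothetical.

```r
# Hypothetical helper: rows per county, and rows with both traffic values present,
# sorted by the number of complete rows (ties broken alphabetically).
county_coverage <- function(df) {
  complete <- !is.na(df$back_peak_hour) & !is.na(df$back_aadt)
  counts <- table(df$county)
  out <- data.frame(
    county     = names(counts),
    n_rows     = as.integer(counts),
    n_complete = as.integer(tapply(complete, df$county, sum))
  )
  out[order(-out$n_complete, out$county), ]
}

# Toy rows (made-up values) to illustrate the output shape
toy <- data.frame(
  county         = c("LA", "LA", "LA", "SF", "SF", "ALP"),
  back_peak_hour = c(9000, NA, 7000, 6000, 5500, 300),
  back_aadt      = c(95000, 80000, 70000, NA, 60000, 2500)
)

county_coverage(toy)
```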

4. Tidy Data

```r
### |- Tidy ----
simple_traffic <- traffic_volumnes_raw |>
  select(county, route, back_peak_hour, back_aadt) |>
  filter(!is.na(back_peak_hour) & !is.na(back_aadt)) |>
  # Major counties with substantial data
  filter(county %in% c("LA", "SD", "ORA", "SF", "SCL", "ALA", "CC")) |>
  # Routes with higher traffic
  filter(back_aadt > 50000) |>
  # Remove extreme high outliers (values more than 3 standard deviations above the mean)
  filter(
    back_peak_hour < mean(back_peak_hour) + 3 * sd(back_peak_hour),
    back_aadt < mean(back_aadt) + 3 * sd(back_aadt)
  ) |>
  mutate(
    # Peak-hour volume relative to an average hour of the day (AADT / 24)
    peak_ratio = back_peak_hour / (back_aadt / 24),
    # LA and ORA get their own groups; SD, SF, SCL, ALA, and CC share the third
    county_group = case_when(
      county == "LA" ~ "Los Angeles",
      county == "ORA" ~ "Orange",
      TRUE ~ "Other Bay Area Counties"
    )
  )
```
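The outlier filter is one-sided: it drops values more than three standard deviations *above* the mean and keeps unusually low values. Because the mean and standard deviation are computed on the already-filtered rows, a single extreme value also inflates the threshold it is tested against. The base-R sketch below reproduces the rule on made-up vectors; `below_upper_fence()` is a hypothetical name, not part of the original code.

```r
# Hypothetical helper mirroring the tidy step's rule:
# keep values strictly below mean + k * sd (upper tail only).
below_upper_fence <- function(x, k = 3) {
  x < mean(x) + k * sd(x)
}

# Made-up peak-hour volumes: twenty ordinary values and one extreme one
x <- c(rep(100, 20), 10000)
keep <- below_upper_fence(x)
sum(!keep)               # 1: only the extreme value is flagged
mean(x) + 3 * sd(x)      # the fence itself, pulled upward by that value

# Low extremes are never removed by this rule
low_x <- c(-1e6, 1, 2, 3)
all(below_upper_fence(low_x))   # TRUE
```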

5. Visualization Parameters

```r
### |-  plot aesthetics ----
colors <- get_theme_colors(
  palette = c(
    "Los Angeles" = "#1D3557",
    "Orange" = "#457B9D",
    "Other Bay Area Counties" = "#A8DADC",
    NULL = "#A8DADC40"
  )
)

### |-  titles and caption ----
# text
title_text    <- str_glue("Relationship Between Peak Hour and Daily Traffic")

subtitle_text <- str_glue("Major corridors in Los Angeles and Orange counties show higher traffic volumes than Bay Area routes")

caption_text <- create_dcc_caption(
  dcc_year = 2025,
  dcc_day = 18,
  source_text = "California Department of Transportation via data.gov"
)

### |-  fonts ----
setup_fonts()
fonts <- get_font_families()

### |-  plot theme ----

# El País theme (at least my interpretation)
el_pais_theme <- function() {
  theme_minimal(base_size = 14) +
    theme(
      # Typography
      text = element_text(family = "Helvetica", color = "#333333"),
      plot.title = element_text(size = rel(1), face = "bold", hjust = 0, margin = margin(b = 10)),
      plot.subtitle = element_text(size = rel(0.79), color = "#666666", hjust = 0, margin = margin(b = 20)),

      # Axis styling
      axis.title = element_text(size = rel(0.71), color = "#666666"),
      axis.text = element_text(size = rel(0.64), color = "#333333"),
      axis.line = element_line(color = "black", linewidth = 0.5),

      # Grid styling
      panel.grid.major = element_line(color = "#f0f0f0", linewidth = 0.5),
      panel.grid.minor = element_blank(),

      # Legend styling (ggplot2 >= 3.5.0 sets inside coordinates via legend.position.inside)
      legend.title = element_text(size = rel(0.71)),
      legend.text = element_text(size = rel(0.64)),
      legend.position = "inside",
      legend.position.inside = c(0.01, 1),
      legend.justification = c(0, 1),
      legend.background = element_rect(fill = "white", color = NA),
      legend.key.size = unit(1.2, "lines"),
      legend.margin = margin(t = 0, r = 10, b = 5, l = 0),

      # Margins & Others
      plot.margin = margin(t = 20, r = 20, b = 15, l = 20),
      plot.background = element_rect(fill = "white", color = "white"),
      panel.background = element_rect(fill = "white", color = "white")
    )
}
```
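`scale_color_manual()` matches colors to groups by name, so the `county_group` labels created in the tidy step must match the palette names exactly (for example `"Other Bay Area Counties"`). In recent ggplot2 versions, an unmatched label is drawn in the scale's `na.value` color (grey by default) rather than raising an error, so a typo can slip through. A quick base-R check, sketched with a hypothetical `unmatched_groups()` helper, lists any label that has no color assigned.

```r
# Hypothetical helper: group labels with no entry in a named palette
unmatched_groups <- function(groups, palette) {
  setdiff(unique(groups), names(palette))
}

palette <- c(
  "Los Angeles"             = "#1D3557",
  "Orange"                  = "#457B9D",
  "Other Bay Area Counties" = "#A8DADC"
)

# A deliberate typo in one label ("County" instead of "Counties")
groups <- c("Los Angeles", "Orange", "Other Bay Area County")
unmatched_groups(groups, palette)   # "Other Bay Area County"
```

With the real data, the check would be `unmatched_groups(simple_traffic$county_group, colors$palette)`.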

6. Plot

```r
### |-  Plot ----
p <- ggplot(simple_traffic, aes(x = back_peak_hour, y = back_aadt)) +
  # Geoms
  geom_point(
    aes(color = county_group),
    alpha = 0.8,
    size = 2.5
  ) +
  geom_smooth(
    method = "lm",
    formula = y ~ x,
    color = colors$palette[1],
    fill = colors$palette[5],
    linewidth = 1,     # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
    fullrange = FALSE  # Only draw the trend line within the range of actual data
  ) +
  # Annotate
  annotate(
    "text",
    x = max(simple_traffic$back_peak_hour) * 0.3,
    y = max(simple_traffic$back_aadt) * 0.9,
    label = "Daily Traffic ≈ 10.5 × Peak Hour Volume",
    color = colors$palette[1],
    fontface = "italic",
    size = 3.5
  ) +
  # Scales
  scale_color_manual(
    values = colors$palette,
    name = "County"
  ) +
  scale_y_continuous(
    labels = function(x) paste0(x / 1000, "K"),
    name = "Annual Average Daily Traffic (vehicles)",
    limits = c(0, max(simple_traffic$back_aadt) * 1.05)
  ) +
  scale_x_continuous(
    labels = function(x) paste0(x / 1000, "K"),
    name = "Peak Hour Traffic Volume (vehicles)",
    limits = c(0, max(simple_traffic$back_peak_hour) * 1.05)
  ) +
  # Labs
  labs(
    title = title_text,
    subtitle = subtitle_text,
    caption = caption_text
  ) +
  # Theme
  el_pais_theme() +
  theme(
    plot.caption = element_markdown(
      size = rel(0.65),
      family = fonts$caption,
      color = colors$caption,
      hjust = 0.5,
      margin = margin(t = 10)
    )
  )
```
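The annotation hard-codes the multiplier as 10.5. A sketch of deriving it from the data instead: fit the same straight line that `geom_smooth(method = "lm", formula = y ~ x)` draws and format its slope into the label. Here `x` and `y` are made-up stand-ins for `back_peak_hour` and `back_aadt`. Because `lm(y ~ x)` also estimates an intercept, "slope × peak hour" approximates daily traffic only when the intercept is small; `lm(y ~ 0 + x)` would force the line through the origin if a pure ratio is wanted.

```r
# Made-up stand-ins for back_peak_hour (x) and back_aadt (y)
x <- c(4000, 6000, 8000, 10000, 12000)
y <- 10.5 * x

# Same model that geom_smooth(method = "lm", formula = y ~ x) fits
fit <- lm(y ~ x)
slope <- coef(fit)[["x"]]

# Build the annotation text from the fitted slope (\u2248 is ≈, \u00d7 is ×)
label <- sprintf("Daily Traffic \u2248 %.1f \u00d7 Peak Hour Volume", slope)
label
```

With the real data, the fit would be `lm(back_aadt ~ back_peak_hour, data = simple_traffic)`, and `label` would be passed to `annotate()` in place of the fixed string.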

7. Save

```r
### |-  plot image ----
save_plot(
  p,
  type = "30daychartchallenge",
  year = 2025,
  day = 18,
  width = 8,
  height = 8
)
```
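`save_plot()` is one of the project's own helpers, and its internals are not shown here. For readers without those helpers, a rough stand-in (an assumption, not necessarily what `save_plot()` does) is `ggplot2::ggsave()` at the same 8 × 8 inch size and the 320 dpi used by `gg_record()` in the setup step. The output file name and the placeholder plot below are illustrative only.

```r
library(ggplot2)

# Placeholder plot standing in for `p` from the Plot step
p_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Save at this post's dimensions (8 x 8 in) and the camcorder dpi (320)
out_file <- file.path(tempdir(), "30dcc_2025_18.png")
ggsave(
  filename = out_file,
  plot     = p_demo,
  width    = 8,
  height   = 8,
  units    = "in",
  dpi      = 320,
  bg       = "white"
)
```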

8. Session Info

```
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
 [1] here_1.0.1      camcorder_0.1.0 scales_1.3.0    skimr_2.1.5
 [5] janitor_2.2.0   showtext_0.9-7  showtextdb_3.0  sysfonts_0.8.9
 [9] ggtext_0.1.2    lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1
[13] dplyr_1.1.4     purrr_1.0.2     readr_2.1.5     tidyr_1.3.1
[17] tibble_3.2.1    ggplot2_3.5.1   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.6      xfun_0.49         htmlwidgets_1.6.4 lattice_0.22-6
 [5] tzdb_0.4.0        vctrs_0.6.5       tools_4.4.0       generics_0.1.3
 [9] curl_6.0.0        parallel_4.4.0    gifski_1.32.0-1   fansi_1.0.6
[13] pacman_0.5.1      pkgconfig_2.0.3   Matrix_1.7-0      lifecycle_1.0.4
[17] farver_2.1.2      compiler_4.4.0    textshaping_0.4.0 munsell_0.5.1
[21] repr_1.1.7        codetools_0.2-20  snakecase_0.11.1  htmltools_0.5.8.1
[25] yaml_2.3.10       crayon_1.5.3      pillar_1.9.0      magick_2.8.5
[29] nlme_3.1-164      commonmark_1.9.2  tidyselect_1.2.1  digest_0.6.37
[33] stringi_1.8.4     labeling_0.4.3    splines_4.4.0     rsvg_2.6.1
[37] rprojroot_2.0.4   fastmap_1.2.0     grid_4.4.0        colorspace_2.1-1
[41] cli_3.6.3         magrittr_2.0.3    base64enc_0.1-3   utf8_1.2.4
[45] withr_3.0.2       bit64_4.5.2       timechange_0.3.0  rmarkdown_2.29
[49] bit_4.5.0         ragg_1.3.3        hms_1.1.3         evaluate_1.0.1
[53] knitr_1.49        markdown_1.13     mgcv_1.9-1        rlang_1.1.4
[57] gridtext_0.1.5    Rcpp_1.0.13-1     glue_1.8.0        xml2_1.3.6
[61] renv_1.0.3        svglite_2.1.3     rstudioapi_0.17.1 vroom_1.6.5
[65] jsonlite_1.8.9    R6_2.5.1          systemfonts_1.1.0
```

9. GitHub Repository

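The complete code for this analysis is available in [`30dcc_2025_18.qmd`](https://github.com/poncest/personal-website/blob/master/data_visualizations/TidyTuesday/2025/30dcc_2025_18.qmd).

For the full repository, see [poncest/personal-website](https://github.com/poncest/personal-website/).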

10. References

1. Data Sources:
   - California Annual Average Daily Traffic Volumes, metadata updated November 27, 2024, [data.gov](https://catalog.data.gov/dataset/traffic-volumes-aadt-ee8d6)

© 2024 Steven Ponce
